04:00
2026-06-24
arxiv.org
large-language-models
Do LLM Attribution Metrics Transfer? Auditing Retrieval-Augmented Generation Evaluation Across Datasets and Constructs
A new study auditing eight automatic attribution metrics across multiple datasets finds that no single metric consistently performs best, with rankings inverting across datasets (Kendall's τ = −0.64)…
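Kendall's τ measures how consistently two datasets order the same set of metrics. It is +1 when the orders agree on every pair, −1 when every pair is reversed, and negative values mean the datasets mostly disagree. Below is a minimal sketch of Kendall's tau-a for two tie-free rankings. It assumes no ties. The metric names in the usage are hypothetical placeholders rather than the study's data, and the study may use a different τ variant.

```python
from itertools import combinations


def kendall_tau(rank_a, rank_b):
    """Kendall's tau-a between two rankings of the same items (assumes no ties).

    rank_a, rank_b: dicts mapping item name -> rank position (1 = best).
    """
    items = list(rank_a)
    if set(items) != set(rank_b):
        raise ValueError("both rankings must cover the same items")
    concordant = discordant = 0
    for x, y in combinations(items, 2):
        # Positive product: both rankings order x and y the same way (concordant).
        # Negative product: the rankings disagree on this pair (discordant).
        s = (rank_a[x] - rank_a[y]) * (rank_b[x] - rank_b[y])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(items) * (len(items) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical example: eight placeholder metrics ranked on two datasets.
metrics = [f"metric_{i}" for i in range(1, 9)]
ranks_dataset_a = {m: i + 1 for i, m in enumerate(metrics)}
ranks_dataset_b = {m: 8 - i for i, m in enumerate(metrics)}  # fully reversed order
print(kendall_tau(ranks_dataset_a, ranks_dataset_b))  # -1.0
```

With eight metrics there are 28 pairs. If the reported figure is tau-a over all eight metrics with no ties, τ ≈ −0.64 corresponds to about 5 concordant and 23 discordant pairs ((5 − 23) / 28 ≈ −0.643). In other words, the two datasets disagree on the relative order of most metric pairs.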